Vector-Valued Support Vector Regression

■ The training data are of the form $\{(x_i, y_i)\}_{i=1}^{\ell}$, $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}^m$

■ $y = f(x, \pi)$, $f: \mathbb{R}^n \to \mathbb{R}^m$

■ Desire to minimize the loss functional (also called empirical risk functional)

$$J(\pi) = \sum_{i=1}^{\ell} L(y_i, f(x_i, \pi))$$

■ Parameter preference expressed by regularization functional

$$P(\pi)$$

■ Balance the two as regularized risk functional

$$R_{reg} = P(\pi) + C\,J(\pi) = P(\pi) + C \sum_{i=1}^{\ell} L(y_i, f(x_i, \pi))$$


How  should  these  be  defined? 




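Parameters and estimator

Scalar Case: $y \in \mathbb{R}$

$$R_{reg} = P(\pi) + C \sum_{i=1}^{\ell} L(y_i, f(x_i, \pi))$$

Parameters: $\pi = \{w, b\}$, with $w$ in the range space of $\varphi(\cdot)$ and $b \in \mathbb{R}$

• The weight $w$ is free.

• The bias $b$ is free.

Estimator form:

$$f(x, \pi) = w^T \varphi(x) + b$$

• The mapping $\varphi(\cdot)$ is given.

• The estimator is linear in the range of $\varphi(\cdot)$!

Vector Case: $y \in \mathbb{R}^m$

$$R_{reg} = P(\pi) + C \sum_{i=1}^{\ell} L(y_i, f(x_i, \pi))$$

Parameters: $\pi = \{W, b\}$, with $W$ a matrix of $m$ rows acting on the range space of $\varphi(\cdot)$ and $b \in \mathbb{R}^m$

• The weight $W$ is free.

• The bias $b$ is free.

Estimator form:

$$f(x, \pi) = W \varphi(x) + b$$

• The mapping $\varphi(\cdot)$ is given.

• The estimator is linear in the range of $\varphi(\cdot)$!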
Regularization and Loss

Scalar Case: $y \in \mathbb{R}$

$$R_{reg} = P(\pi) + C \sum_{i=1}^{\ell} L(y_i, f(x_i, \pi))$$

Regularization functional:

$$P(\pi) = \tfrac{1}{2} w^T w$$

• The weight $w$ is penalized.

• The bias $b$ is not penalized.

Loss Function:

Vector Case: $y \in \mathbb{R}^m$

$$R_{reg} = P(\pi) + C \sum_{i=1}^{\ell} L(y_i, f(x_i, \pi))$$

Regularization functional:

$$P(\pi) = \tfrac{1}{2} \operatorname{Tr}(W W^T)$$

• The weight $W$ is penalized.

• The bias $b$ is not penalized.

Loss Function:


More to follow
The Loss Function (continued)


$\varepsilon$-insensitive Loss

[Figure: three panels, each titled "$\varepsilon$-insensitive loss function with $\varepsilon = 1$"]


But what about the mapping $\varphi(\cdot)$?
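The plots did not survive extraction, so a small numerical illustration may help. The following is a minimal Python sketch, assuming the vector loss has the form $\max(0, \|y - f\|_p - \varepsilon)$, which is what the primal constraints later in the slides imply. The function name `eps_insensitive_loss` and the sample values are hypothetical.

```python
import numpy as np

def eps_insensitive_loss(y, f, eps=1.0, p=2):
    """Assumed form: max(0, ||y - f||_p - eps); zero inside the eps-tube."""
    residual = np.atleast_1d(np.asarray(y, dtype=float) - np.asarray(f, dtype=float))
    return max(0.0, float(np.linalg.norm(residual, ord=p)) - eps)

# The same residual is penalized differently under each norm.
y = np.array([1.0, 2.0])
f = np.array([0.0, 0.0])
loss_1 = eps_insensitive_loss(y, f, eps=1.0, p=1)         # ||.||_1 = 3
loss_2 = eps_insensitive_loss(y, f, eps=1.0, p=2)         # ||.||_2 = sqrt(5)
loss_inf = eps_insensitive_loss(y, f, eps=1.0, p=np.inf)  # ||.||_inf = 2
```

With a scalar target the three norms coincide and the function reduces to the usual scalar $\varepsilon$-insensitive loss.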
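Back to SVM: How to solve it


Scalar Case: $y \in \mathbb{R}$

Problem:

Minimize over $\{w, b\}$:

$$R_{reg} = \tfrac{1}{2} w^T w + C \sum_{i=1}^{\ell} \left| y_i - w^T \varphi(x_i) - b \right|_{\varepsilon}$$

Approach:

Non-smooth, need $\varphi(\cdot)$ → Smooth, need $\varphi(\cdot)$ → Smooth, need $k(\cdot,\cdot)$


Vector Case: $y \in \mathbb{R}^m$

Problem:

Minimize over $\{W, b\}$:

$$R_{reg} = \tfrac{1}{2} \operatorname{Tr}(W W^T) + C \sum_{i=1}^{\ell} \Big|\, \| y_i - W \varphi(x_i) - b \|_p \,\Big|_{\varepsilon}$$

Approach:

Non-smooth, need $\varphi(\cdot)$ → Smooth, need $\varphi(\cdot)$ → Smooth, need $k(\cdot,\cdot)$

Here $|u|_{\varepsilon} = \max(0, |u| - \varepsilon)$ denotes the $\varepsilon$-insensitive magnitude.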
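To make the vector-case objective concrete, here is a minimal Python sketch that evaluates it for given parameters. It assumes the feature vectors $\varphi(x_i)$ are available explicitly as rows of a matrix `Phi`, which is only true for finite-dimensional maps. All names and the toy numbers are hypothetical.

```python
import numpy as np

def regularized_risk(W, b, Phi, Y, C=1.0, eps=0.1, p=2):
    """R_reg = 0.5*Tr(W W^T) + C * sum_i max(0, ||y_i - W phi_i - b||_p - eps).

    Phi: (l, N) rows phi(x_i); Y: (l, m) rows y_i; W: (m, N); b: (m,).
    """
    residuals = Y - (Phi @ W.T + b)                    # (l, m) residual per sample
    norms = np.linalg.norm(residuals, ord=p, axis=1)   # per-sample p-norm
    loss = np.maximum(0.0, norms - eps).sum()          # eps-insensitive part
    return 0.5 * np.trace(W @ W.T) + C * loss

# Toy check with an identity feature map (an assumption for illustration only).
Phi = np.eye(2)
W = np.eye(2)
b = np.zeros(2)
Y = np.array([[1.0, 0.0], [0.0, 3.0]])
risk = regularized_risk(W, b, Phi, Y, C=2.0, eps=0.5, p=2)
```

In the toy case only the second sample leaves the tube, with residual norm 2, so the loss term contributes $2 \cdot (2 - 0.5) = 3$ and the regularizer contributes 1.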
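The Primal Problem


Scalar Case: $y \in \mathbb{R}$

Minimize over the primal variables $w, b, \{\xi_i, \xi_i^*\}_{i=1}^{\ell}$:

$$P = \tfrac{1}{2} w^T w + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*)$$

subject to:

$$y_i - w^T \varphi(x_i) - b - \varepsilon - \xi_i \le 0$$

$$-y_i + w^T \varphi(x_i) + b - \varepsilon - \xi_i^* \le 0$$

$$\xi_i, \xi_i^* \ge 0$$


Vector Case: $y \in \mathbb{R}^m$

Minimize over the primal variables $W, b, \{\xi_i, \delta_i, \delta_i^*\}_{i=1}^{\ell}$:

$$P = \tfrac{1}{2} \operatorname{Tr}(W W^T) + C \sum_{i=1}^{\ell} \xi_i$$

subject to:

$$\| \delta_i + \delta_i^* \|_p - \varepsilon - \xi_i \le 0, \quad \xi_i \ge 0$$

$$y_i - W \varphi(x_i) - b - \delta_i \le 0, \quad \delta_i \ge 0$$

$$-y_i + W \varphi(x_i) + b - \delta_i^* \le 0, \quad \delta_i^* \ge 0$$

$\delta_i$, $\delta_i^*$ and $\xi_i$ are slack variables.

Example: slack variable $\pi$

$$a \le b \iff \begin{cases} a + \pi = b \\ \pi \ge 0 \end{cases}$$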

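The Lagrange Problem

Scalar Case: $y \in \mathbb{R}$

Minimize over the primal variables $w, b, \{\xi_i, \xi_i^*\}_{i=1}^{\ell}$; maximize over the dual variables $\{\alpha_i, \alpha_i^*, \eta_i, \eta_i^*\}_{i=1}^{\ell}$:

$$\begin{aligned}
L = {} & \tfrac{1}{2} w^T w + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*) \\
& - \sum_{i=1}^{\ell} \alpha_i \left( \varepsilon + \xi_i - y_i + w^T \varphi(x_i) + b \right) \\
& - \sum_{i=1}^{\ell} \alpha_i^* \left( \varepsilon + \xi_i^* + y_i - w^T \varphi(x_i) - b \right) \\
& - \sum_{i=1}^{\ell} \left( \eta_i \xi_i + \eta_i^* \xi_i^* \right)
\end{aligned}$$

subject to: $\alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0$

Vector Case: $y \in \mathbb{R}^m$

Minimize over the primal variables $W, b, \{\xi_i, \delta_i, \delta_i^*\}_{i=1}^{\ell}$; maximize over the dual variables $\{\alpha_i, \eta_i, \gamma_i, \gamma_i^*, \theta_i, \theta_i^*\}_{i=1}^{\ell}$:

$$\begin{aligned}
L = {} & \tfrac{1}{2} \operatorname{Tr}(W W^T) + C \sum_{i=1}^{\ell} \xi_i \\
& + \sum_{i=1}^{\ell} \alpha_i \left( \| \delta_i + \delta_i^* \|_p - \varepsilon - \xi_i \right) - \sum_{i=1}^{\ell} \eta_i \xi_i \\
& + \sum_{i=1}^{\ell} \gamma_i^T \left( y_i - W \varphi(x_i) - b - \delta_i \right) \\
& + \sum_{i=1}^{\ell} \gamma_i^{*T} \left( -y_i + W \varphi(x_i) + b - \delta_i^* \right) \\
& - \sum_{i=1}^{\ell} \left( \theta_i^T \delta_i + \theta_i^{*T} \delta_i^* \right)
\end{aligned}$$

subject to: $\alpha_i, \eta_i \ge 0$, $\gamma_i, \gamma_i^*, \theta_i, \theta_i^* \ge 0$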
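Minimization of L over Primals


Scalar Case: $y \in \mathbb{R}$

$$\frac{\partial L}{\partial b} = \sum_{i=1}^{\ell} (\alpha_i^* - \alpha_i) = 0$$

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*) \varphi(x_i) = 0$$

$$\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \eta_i = 0$$

$$\frac{\partial L}{\partial \xi_i^*} = C - \alpha_i^* - \eta_i^* = 0$$

Introduce: $\beta_i = \alpha_i - \alpha_i^*$


Vector Case: $y \in \mathbb{R}^m$

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{\ell} (\gamma_i - \gamma_i^*) = 0$$

$$\frac{\partial L}{\partial W} = W - \sum_{i=1}^{\ell} (\gamma_i - \gamma_i^*) \varphi^T(x_i) = 0$$

$$\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \eta_i = 0$$

$$\frac{\partial L}{\partial \delta_i} = \alpha_i \left( \frac{\delta_i + \delta_i^*}{\| \delta_i + \delta_i^* \|_p} \right)^{p-1} - \gamma_i - \theta_i = 0$$

$$\frac{\partial L}{\partial \delta_i^*} = \alpha_i \left( \frac{\delta_i + \delta_i^*}{\| \delta_i + \delta_i^* \|_p} \right)^{p-1} - \gamma_i^* - \theta_i^* = 0$$

Introduce: $\Gamma_i = \gamma_i - \gamma_i^*$

$e_i = \delta_i - \delta_i^*$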
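Implications of Optimization

Scalar Case: $y \in \mathbb{R}$

Equality Constraint:

$$\sum_{i=1}^{\ell} \beta_i = 0$$

Weights:

$$w = \sum_{i=1}^{\ell} \beta_i \varphi(x_i)$$

Constraints:

$$C - \alpha_i - \eta_i = 0 \;\rightarrow\; \alpha_i \le C$$

$$C - \alpha_i^* - \eta_i^* = 0 \;\rightarrow\; \alpha_i^* \le C$$

Estimator:

$$y(x) = \sum_{i=1}^{\ell} \beta_i \varphi^T(x_i) \varphi(x) + b = \sum_{i=1}^{\ell} \beta_i k(x_i, x) + b$$

Vector Case: $y \in \mathbb{R}^m$

Equality Constraint:

$$\sum_{i=1}^{\ell} \Gamma_i = 0$$

Weights:

$$W = \sum_{i=1}^{\ell} \Gamma_i \varphi^T(x_i)$$

Constraints:

$$C - \alpha_i - \eta_i = 0 \;\rightarrow\; \alpha_i \le C$$

Estimator:

$$y(x) = \sum_{i=1}^{\ell} \Gamma_i \varphi^T(x_i) \varphi(x) + b = \sum_{i=1}^{\ell} \Gamma_i k(x_i, x) + b$$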
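The same substitution that removes $\varphi(\cdot)$ from the estimator also removes it from the regularizer, which is where the quadratic term of the dual comes from. A short worked step, assuming only the weight expansion $W = \sum_i \Gamma_i \varphi^T(x_i)$ and the kernel definition $k(x_i, x_j) = \varphi^T(x_i)\varphi(x_j)$:

```latex
\begin{aligned}
W W^{T} &= \Big(\sum_{i=1}^{\ell} \Gamma_i \varphi^{T}(x_i)\Big)
           \Big(\sum_{j=1}^{\ell} \varphi(x_j)\Gamma_j^{T}\Big)
         = \sum_{i=1}^{\ell}\sum_{j=1}^{\ell} k(x_i,x_j)\,\Gamma_i\Gamma_j^{T},\\
\tfrac{1}{2}\operatorname{Tr}(W W^{T})
        &= \tfrac{1}{2}\sum_{i=1}^{\ell}\sum_{j=1}^{\ell} \Gamma_i^{T}\Gamma_j\,k(x_i,x_j).
\end{aligned}
```

The scalar case follows the same pattern with $w = \sum_i \beta_i \varphi(x_i)$, giving $\tfrac{1}{2} w^T w = \tfrac{1}{2}\sum_i\sum_j \beta_i \beta_j k(x_i, x_j)$.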

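The Dual Problem


Scalar Case: $y \in \mathbb{R}$

Regularized Risk:

Minimize over $\{w, b\}$:

$$R_{reg} = \tfrac{1}{2} w^T w + C \sum_{i=1}^{\ell} \left| y_i - w^T \varphi(x_i) - b \right|_{\varepsilon}$$

Dual Problem:

Maximize over $\{\beta_i\}_{i=1}^{\ell}$:

$$D = -\tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \beta_i \beta_j k(x_i, x_j) + \sum_{i=1}^{\ell} y_i \beta_i - \varepsilon \sum_{i=1}^{\ell} |\beta_i|$$

Subject to:

$$\sum_{i=1}^{\ell} \beta_i = 0, \quad |\beta_i| \le C$$

Solve for $\beta_i$ numerically


Vector Case: $y \in \mathbb{R}^m$, $\frac{1}{p} + \frac{1}{q} = 1$

Regularized Risk:

Minimize over $\{W, b\}$:

$$R_{reg} = \tfrac{1}{2} \operatorname{Tr}(W W^T) + C \sum_{i=1}^{\ell} \Big|\, \| y_i - W \varphi(x_i) - b \|_p \,\Big|_{\varepsilon}$$

Dual Problem:

Maximize over $\{\Gamma_i\}_{i=1}^{\ell}$:

$$D = -\tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \Gamma_i^T \Gamma_j k(x_i, x_j) + \sum_{i=1}^{\ell} y_i^T \Gamma_i - \varepsilon \sum_{i=1}^{\ell} \| \Gamma_i \|_q$$

Subject to:

$$\sum_{i=1}^{\ell} \Gamma_i = 0, \quad \| \Gamma_i \|_q \le C$$

Solve for $\Gamma_i$ numerically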
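Any numerical solver needs the dual objective as a function of the $\Gamma_i$. The following minimal Python sketch evaluates the vector-case $D$ for a candidate set of dual variables, taking $q$ from $\frac{1}{p} + \frac{1}{q} = 1$. The function names and the toy values are hypothetical, and this only evaluates the objective; it is not a solver.

```python
import numpy as np

def dual_norm_order(p):
    """Return q with 1/p + 1/q = 1 (q = inf for p = 1, q = 1 for p = inf)."""
    if p == 1:
        return np.inf
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0)

def dual_objective(Gamma, K, Y, eps, p):
    """D = -1/2 sum_ij Gamma_i^T Gamma_j K_ij + sum_i y_i^T Gamma_i - eps sum_i ||Gamma_i||_q.

    Gamma, Y: (l, m) arrays with rows Gamma_i, y_i; K: (l, l) Gram matrix.
    """
    q = dual_norm_order(p)
    quad = -0.5 * np.sum(K * (Gamma @ Gamma.T))   # (Gamma Gamma^T)_ij = Gamma_i^T Gamma_j
    linear = np.sum(Y * Gamma)
    penalty = eps * np.linalg.norm(Gamma, ord=q, axis=1).sum()
    return quad + linear - penalty

# Toy values; the rows of Gamma sum to zero as the equality constraint requires.
K = np.array([[1.0, 0.5], [0.5, 1.0]])
Y = np.array([[2.0, 0.0], [0.0, 0.0]])
Gamma = np.array([[1.0, 0.0], [-1.0, 0.0]])
D = dual_objective(Gamma, K, Y, eps=0.1, p=2)
```

For these values the quadratic term is $-0.5$, the linear term is $2$, and the penalty is $0.2$.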
Half Way There

Scalar Case: $y \in \mathbb{R}$

Dual Problem: Solve for $\beta_i$ numerically

Maximize: $D\left(\{\beta_i\}_{i=1}^{\ell}\right)$

Subject to: $\sum_{i=1}^{\ell} \beta_i = 0$, $|\beta_i| \le C$

Estimator:

$$y(x) = \sum_{i=1}^{\ell} \beta_i k(x_i, x) + b$$

Support Vectors: $\{x_i : \beta_i \ne 0\}$

The bias remains to be found

Vector Case: $y \in \mathbb{R}^m$

Dual Problem: Solve for $\Gamma_i$ numerically

Maximize: $D\left(\{\Gamma_i\}_{i=1}^{\ell}\right)$

Subject to: $\sum_{i=1}^{\ell} \Gamma_i = 0$, $\|\Gamma_i\|_q \le C$

Estimator:

$$y(x) = \sum_{i=1}^{\ell} \Gamma_i k(x_i, x) + b$$

Support Vectors: $\{x_i : \Gamma_i \ne 0\}$

The bias remains to be found

Must develop the KKT conditions to find the bias
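Once the $\Gamma_i$ and the bias are known, evaluating the vector-valued estimator is a kernel-weighted sum over the support vectors. A minimal Python sketch follows. It assumes a Gaussian RBF kernel purely for illustration, and `rbf_kernel`, `predict`, and the tolerance `tol` are hypothetical names rather than anything defined in the slides.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Assumed kernel: exp(-||a - b||^2 / (2 sigma^2))."""
    diff = np.atleast_1d(a) - np.atleast_1d(b)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma**2)))

def predict(x, X_train, Gamma, b, kernel=rbf_kernel, tol=1e-8):
    """y(x) = sum_i Gamma_i k(x_i, x) + b, summing only over support vectors."""
    y = np.array(b, dtype=float)                # copy so the caller's b is untouched
    for x_i, gamma_i in zip(X_train, Gamma):
        if np.linalg.norm(gamma_i) > tol:       # support vector: Gamma_i != 0
            y += gamma_i * kernel(x_i, x)
    return y

# Toy model: the middle training point has Gamma_i = 0, so it is skipped.
X_train = np.array([[0.0], [1.0], [2.0]])
Gamma = np.array([[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0]])
b = np.array([0.5, 0.5])
y_hat = predict(np.array([0.0]), X_train, Gamma, b)
```

Skipping rows with $\Gamma_i = 0$ is what makes sparsity pay off at prediction time: the cost scales with the number of support vectors, not with $\ell$.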
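KKT Conditions

Scalar Case: $y \in \mathbb{R}$

$$\alpha_i \left( \varepsilon + \xi_i - y_i + w^T \varphi(x_i) + b \right) = 0$$

$$\alpha_i^* \left( \varepsilon + \xi_i^* + y_i - w^T \varphi(x_i) - b \right) = 0$$

$$\eta_i \xi_i = (C - \alpha_i) \xi_i = 0$$

$$\eta_i^* \xi_i^* = (C - \alpha_i^*) \xi_i^* = 0$$

Note: $\circ$ denotes a parallel or element-wise product. Exponents are taken element-wise.

Vector Case: $y \in \mathbb{R}^m$

$$\alpha_i \left( \| \delta_i + \delta_i^* \|_p - \varepsilon - \xi_i \right) = 0$$

$$\eta_i \xi_i = (C - \alpha_i) \xi_i = 0$$

$$\gamma_i \circ \left( y_i - W \varphi(x_i) - b - \delta_i \right) = 0$$

$$\gamma_i^* \circ \left( -y_i + W \varphi(x_i) + b - \delta_i^* \right) = 0$$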
Implications of KKT Conditions


Vector Case: $y \in \mathbb{R}^m$

Primal + Dual + KKT conditions imply


KKT Conditions (in the tube)

Scalar Case: $y \in \mathbb{R}$

Vector Case: $y \in \mathbb{R}^m$

$y$ strictly inside the $\varepsilon$-tube


KKT Conditions (on the margin), $\frac{1}{p} + \frac{1}{q} = 1$

Scalar Case: $y \in \mathbb{R}$

Vector Case: $y \in \mathbb{R}^m$


KKT Conditions (out of the tube), $\frac{1}{p} + \frac{1}{q} = 1$

Scalar Case: $y \in \mathbb{R}$

Vector Case: $y \in \mathbb{R}^m$

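Finding the Bias


Scalar Case: $y \in \mathbb{R}$

Maximize: $D\left(\{\beta_i\}_{i=1}^{\ell}\right)$

Subject to: $\sum_{i=1}^{\ell} \beta_i = 0$, $|\beta_i| \le C$

Solve for $\beta_i$ numerically

$$e_i = \underbrace{y_i - \sum_{j=1}^{\ell} \beta_j k(x_j, x_i)}_{F_i} - b$$


Vector Case: $y \in \mathbb{R}^m$

Maximize: $D\left(\{\Gamma_i\}_{i=1}^{\ell}\right)$

Subject to: $\sum_{i=1}^{\ell} \Gamma_i = 0$, $\|\Gamma_i\|_q \le C$

Solve for $\Gamma_i$ numerically

$$e_i = \underbrace{y_i - \sum_{j=1}^{\ell} \Gamma_j k(x_j, x_i)}_{F_i} - b$$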
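Finding the bias ($1 < p < \infty$), $\frac{1}{p} + \frac{1}{q} = 1$

Scalar Case: $y \in \mathbb{R}$

For $i \in M = \{ i : |\beta_i| \in (0, C) \}$

$$F_i - b = e_i = \operatorname{sign}(\beta_i)\, \varepsilon \quad \text{by KKT}$$

If $s_i = \operatorname{sign}(\beta_i)$, it follows that

$$b = F_i - s_i\, \varepsilon$$

Vector Case: $y \in \mathbb{R}^m$

For $i \in M = \{ i : \|\Gamma_i\|_q \in (0, C) \}$

$$F_i - b = e_i = \left( \frac{|\Gamma_i|}{\|\Gamma_i\|_q} \right)^{q-1} \circ \operatorname{sign}(\Gamma_i)\, \varepsilon \quad \text{by KKT}$$

If $s_i = \left( \frac{|\Gamma_i|}{\|\Gamma_i\|_q} \right)^{q-1} \circ \operatorname{sign}(\Gamma_i)$, it follows that

$$b = F_i - s_i\, \varepsilon$$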
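The margin relation $F_i - b = s_i\,\varepsilon$ can be turned into a bias estimate directly. A minimal Python sketch follows, under two labeled assumptions: $F_i = y_i - \sum_j \Gamma_j k(x_j, x_i)$ is the error before the bias is added, and the per-sample values $F_i - s_i\,\varepsilon$ are averaged over all margin vectors, a common numerical choice that the slides do not state. All names are hypothetical.

```python
import numpy as np

def bias_from_margin(Gamma, K, Y, C, eps, q=2.0, tol=1e-8):
    """Average b = F_i - s_i * eps over margin vectors, 0 < ||Gamma_i||_q < C.

    Assumes 1 < q < inf so that the element-wise power below is well defined.
    """
    F = Y - K @ Gamma                                   # rows F_i; K is symmetric
    norms = np.linalg.norm(Gamma, ord=q, axis=1)
    margin = (norms > tol) & (norms < C - tol)
    if not margin.any():
        raise ValueError("no margin vectors; the bias is not determined this way")
    G = Gamma[margin]
    s = (np.abs(G) / norms[margin, None]) ** (q - 1.0) * np.sign(G)
    return (F[margin] - eps * s).mean(axis=0)

# Synthetic check: build targets that satisfy the margin relation for a known bias.
K = np.eye(2)
Gamma = np.array([[0.3, 0.4], [-0.3, -0.4]])            # 2-norm 0.5 each, inside (0, C)
b_true = np.array([1.0, -1.0])
s_true = np.array([[0.6, 0.8], [-0.6, -0.8]])
Y = K @ Gamma + b_true + 0.1 * s_true
b_est = bias_from_margin(Gamma, K, Y, C=1.0, eps=0.1, q=2.0)
```

Averaging is only a noise-reduction device; with exact dual variables every margin vector yields the same $b$.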
Summary


Vector Case: $y \in \mathbb{R}^m$, $\frac{1}{p} + \frac{1}{q} = 1$

[Figure: flow diagram with blocks labeled "Regularized Risk Minimization", "Primal", "Dual Problem", "Estimator", "Unbiased error", and "bias calculation from KKT"]

Solve for $\Gamma_i$ numerically
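Comparison


Scalar Case: $y \in \mathbb{R}$

Estimator form:

$$y(x) = w^T \varphi(x) + b = \sum_{i \in SV} \beta_i k(x_i, x) + b$$

Optimization Problem:

Maximize:

$$-\tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \beta_i \beta_j k(x_i, x_j) + \sum_{i=1}^{\ell} y_i \beta_i - \varepsilon \sum_{i=1}^{\ell} |\beta_i|$$

Subject to: $\sum_{i=1}^{\ell} \beta_i = 0$, $|\beta_i| \le C$

KKT:

$$\beta_i = 0 \;\rightarrow\; |e_i| \le \varepsilon$$

$$|\beta_i| \in (0, C) \;\rightarrow\; |e_i| = \varepsilon$$

$$|\beta_i| = C \;\rightarrow\; |e_i| \ge \varepsilon$$


Vector Case: $y \in \mathbb{R}^m$

Estimator form:

$$y(x) = W \varphi(x) + b = \sum_{i \in SV} \Gamma_i k(x_i, x) + b$$

Optimization Problem:

Maximize:

$$-\tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \Gamma_i^T \Gamma_j k(x_i, x_j) + \sum_{i=1}^{\ell} y_i^T \Gamma_i - \varepsilon \sum_{i=1}^{\ell} \|\Gamma_i\|_q$$

Subject to: $\sum_{i=1}^{\ell} \Gamma_i = 0$, $\|\Gamma_i\|_q \le C$

KKT:

$$\Gamma_i = 0 \;\rightarrow\; \|e_i\|_p \le \varepsilon$$

$$\|\Gamma_i\|_q \in (0, C) \;\rightarrow\; \|e_i\|_p = \varepsilon$$

$$\|\Gamma_i\|_q = C \;\rightarrow\; \|e_i\|_p \ge \varepsilon$$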
Demonstrations for $p = 1, 2, \infty$

Vector Case: $y \in \mathbb{R}^m$


We will demonstrate the VV-SVR for the following process

$$y(x) = \begin{bmatrix} e^{0.1x}\, \operatorname{sinc}(x) \\ \cos(0.1 x^2) \end{bmatrix}$$

The following RBF kernel was used with $\sigma = 1$.

$$k(x_i, x_j) = \exp\left( -\frac{\| x_i - x_j \|^2}{2 \sigma^2} \right)$$

50 samples were chosen on the interval $[0, 10]$ for training and $\varepsilon = 0.1$.

Matlab was used as the solver.

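The 1-Norm

$p = 1$, $q = \infty$

Vector Case: $y \in \mathbb{R}^m$

The dual problem:

Maximize:

$$D = -\tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \Gamma_i^T \Gamma_j k(x_i, x_j) + \sum_{i=1}^{\ell} y_i^T \Gamma_i - \varepsilon \sum_{i=1}^{\ell} \|\Gamma_i\|_{\infty}$$

Subject to:

$$\sum_{i=1}^{\ell} \Gamma_i = 0, \quad \|\Gamma_i\|_{\infty} \le C$$

Note: $D$ is non-smooth in its objective and constraint.

The dual problem after introduction of $\alpha_i$:

Maximize:

$$D = -\tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \Gamma_i^T \Gamma_j k(x_i, x_j) + \sum_{i=1}^{\ell} y_i^T \Gamma_i - \varepsilon \sum_{i=1}^{\ell} \alpha_i$$

Subject to:

$$\sum_{i=1}^{\ell} \Gamma_i = 0, \quad -\alpha_i \mathbf{1} \le \Gamma_i \le \alpha_i \mathbf{1}, \quad \alpha_i \le C$$

Note: $D$ is quadratic in its objective and linear in its constraint.

Solve for $\Gamma_i$ using a standard QP package.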

Demonstration (1-Norm)


Vector Case: $y \in \mathbb{R}^m$


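The 2-Norm


Vector Case: $y \in \mathbb{R}^m$

Note: $D$ is non-smooth in its objective and constraint. The constraint is nonlinear.

The dual problem after introduction of $\alpha_i$:

Maximize:

$$D = -\tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \Gamma_i^T \Gamma_j k(x_i, x_j) + \sum_{i=1}^{\ell} y_i^T \Gamma_i - \varepsilon \sum_{i=1}^{\ell} \alpha_i$$

Subject to:

$$\sum_{i=1}^{\ell} \Gamma_i = 0, \quad \Gamma_i^T \Gamma_i \le \alpha_i^2, \quad 0 \le \alpha_i \le C$$

Note: $D$ is quadratic in its objective and nonlinear but smooth in its constraint.

Solve for $\Gamma_i$ using a standard nonlinear programming package which can use gradients.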

Demonstration (2-Norm)


Vector Case: $y \in \mathbb{R}^m$


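The $\infty$-Norm

$p = \infty$, $q = 1$

Vector Case: $y \in \mathbb{R}^m$

The dual problem:

Maximize:

$$D = -\tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \Gamma_i^T \Gamma_j k(x_i, x_j) + \sum_{i=1}^{\ell} y_i^T \Gamma_i - \varepsilon \sum_{i=1}^{\ell} \|\Gamma_i\|_1$$

Subject to:

$$\sum_{i=1}^{\ell} \Gamma_i = 0, \quad \|\Gamma_i\|_1 \le C$$

Note: $D$ is non-smooth in its objective and constraint.

The dual problem after introduction of $\gamma_i$ and $\gamma_i^*$:

Maximize:

$$\begin{aligned}
D = {} & -\tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} (\gamma_i - \gamma_i^*)^T (\gamma_j - \gamma_j^*)\, k(x_i, x_j) \\
& + \sum_{i=1}^{\ell} y_i^T (\gamma_i - \gamma_i^*) - \varepsilon \sum_{i=1}^{\ell} \mathbf{1}^T (\gamma_i + \gamma_i^*)
\end{aligned}$$

Subject to:

$$\sum_{i=1}^{\ell} (\gamma_i - \gamma_i^*) = 0, \quad \mathbf{1}^T (\gamma_i + \gamma_i^*) \le C, \quad \gamma_i, \gamma_i^* \ge 0$$

Note: $D$ is quadratic in its objective and linear in its constraint.

Solve for $\Gamma_i$ using a standard QP package.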

Demonstration ($\infty$-Norm)


Vector Case: $y \in \mathbb{R}^m$


Example Hwang

■ Hwang data set

$$H : [0, 1]^2 \to \mathbb{R}^5$$

■ Artificial vector-valued data set

■ Of historical significance

■ Input domain is randomly sampled
48 Support Vectors


KKT  Conditions 
Sparsity


Compare VV-SVR with aggregated SVR (libsvm)

Hwang data set with 2,000 points

Equal Volume

□ VV-SVR $\varepsilon = 0.5$

□ SVR $\varepsilon = 0.34850$

Support Vectors

□ VV-SVR 55

□ SVR 92
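The two $\varepsilon$ values are consistent with one reading of "Equal Volume": the Euclidean ball of radius 0.5 in $\mathbb{R}^5$ has the same volume as the hypercube $[-a, a]^5$ that five independent scalar SVRs would use as their joint tube. The minimal Python sketch below checks that reading. The interpretation itself, the use of the 2-norm, and the function names are assumptions, not statements from the slides.

```python
import math

def ball_volume_2norm(radius, m):
    """Volume of a Euclidean ball of the given radius in R^m."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1) * radius ** m

def equal_volume_half_width(radius, m):
    """Half-width a of the hypercube [-a, a]^m whose volume matches the ball."""
    return 0.5 * ball_volume_2norm(radius, m) ** (1.0 / m)

eps_svr = equal_volume_half_width(0.5, m=5)
print(round(eps_svr, 4))  # prints 0.3485
```

Under this reading the comparison is fair in the sense that both methods ignore errors within insensitive regions of equal volume, and the support-vector counts can be compared directly.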
Conclusions


Vector Case: $y \in \mathbb{R}^m$


■  VV-SVR  generalizes  the  scalar-valued  case. 

■  Estimator  form  and  parameters 

■  Loss  function 

■  Regularization  functional 

■  VV-SVR  maintains  the  sparsity  of  the  scalar-valued  case. 